Policy Search using Paired Comparisons
Abstract
Direct policy search is a practical way to solve reinforcement learning (RL) problems involving continuous state and action spaces. The goal then becomes finding policy parameters that maximize a noisy objective function. The Pegasus method converts this stochastic optimization problem into a deterministic one by using fixed start states and fixed random number sequences for comparing policies (Ng and Jordan, 2000). We evaluate Pegasus and new paired comparison methods on the mountain car problem and on a difficult pursuer-evader problem. We conclude that: (i) paired tests can improve the performance of optimization procedures; (ii) several methods are available to reduce the 'overfitting' effect found with Pegasus; (iii) adapting the number of trials used for each comparison yields faster learning; and (iv) pairing also helps stochastic search methods such as differential evolution.
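The core idea in the abstract is to evaluate competing policies on identical fixed start states and random number sequences, then compare them with a paired test. It can be illustrated with a small sketch. This is not the paper's code, and several things here are hypothetical choices made only for illustration:

- the toy linear-quadratic task, which stands in for the paper's mountain car and pursuer-evader problems;
- the function names `make_scenarios`, `episode_return`, `paired_t`, `paired_compare` and `adaptive_compare`;
- the horizon, the noise level, and the stopping rule with `t_crit = 2.0`.

```python
import math
import random
import statistics
from typing import List, Tuple

# A scenario pins down every source of randomness in one episode:
# (start state, seed for the noise sequence). Reusing the same scenarios for
# every policy is the Pegasus idea of fixed start states and random numbers.
Scenario = Tuple[float, int]


def make_scenarios(n: int, seed: int = 0) -> List[Scenario]:
    rng = random.Random(seed)
    return [(rng.uniform(-1.0, 1.0), rng.randrange(2**31)) for _ in range(n)]


def episode_return(theta: float, scenario: Scenario, horizon: int = 20) -> float:
    """Hypothetical toy task: policy a = -theta * x on dynamics x' = x + a + noise.

    Returns minus the summed quadratic cost. For a fixed scenario the result is
    deterministic, so the noisy objective becomes a deterministic function.
    """
    x, noise_seed = scenario
    rng = random.Random(noise_seed)  # same seed -> same noise for every policy
    total = 0.0
    for _ in range(horizon):
        a = -theta * x
        total -= x * x + 0.1 * a * a
        x = x + a + rng.gauss(0.0, 0.1)
    return total


def paired_t(diffs: List[float]) -> float:
    """Paired t statistic for the mean of per-scenario return differences."""
    mean = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # needs at least two differences
    if sd == 0.0:
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return mean / (sd / math.sqrt(len(diffs)))


def paired_compare(theta_a: float, theta_b: float,
                   scenarios: List[Scenario]) -> Tuple[float, float]:
    """Evaluate both policies on the SAME scenarios; return (mean diff, t)."""
    diffs = [episode_return(theta_a, s) - episode_return(theta_b, s)
             for s in scenarios]
    return statistics.fmean(diffs), paired_t(diffs)


def adaptive_compare(theta_a: float, theta_b: float, scenarios: List[Scenario],
                     min_n: int = 5, t_crit: float = 2.0) -> Tuple[int, int]:
    """Add paired trials one at a time and stop once |t| >= t_crit.

    Returns (decision, trials_used): +1 if theta_a looks better, -1 if theta_b
    does, 0 if still undecided after all scenarios. Re-testing after every
    trial with a fixed threshold is a heuristic, not an exact sequential test.
    """
    diffs: List[float] = []
    for s in scenarios:
        diffs.append(episode_return(theta_a, s) - episode_return(theta_b, s))
        if len(diffs) >= max(min_n, 2):
            t = paired_t(diffs)
            if abs(t) >= t_crit:
                return (1 if t > 0 else -1), len(diffs)
    return 0, len(diffs)


if __name__ == "__main__":
    scenarios = make_scenarios(30, seed=1)
    print(paired_compare(0.8, 0.2, scenarios))
    print(adaptive_compare(0.8, 0.2, scenarios))
```

Both policies see the same start state and noise in each scenario. Much of the scenario-to-scenario variation therefore cancels in the differences, which is why a paired test can often separate two policies with fewer trials than independent evaluations. The flip side of fixing a finite scenario set is that an optimizer can tune the policy to quirks of that particular set. This is one plausible reading of the 'overfitting' effect the abstract attributes to Pegasus.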
Similar resources
Direct Policy Search using Paired Statistical Tests
Direct policy search is a practical way to solve reinforcement learning problems involving continuous state and action spaces. The goal becomes finding policy parameters that maximize a noisy objective function. The Pegasus method converts this stochastic optimization problem into a deterministic one, by using fixed start states and fixed random number sequences for comparing policies (Ng & Jor...
How to Analyze Paired Comparison Data
Thurstone’s Law of Comparative Judgment provides a method to convert subjective paired comparisons into one-dimensional quality scores. Applications include judging quality of different image reconstructions, or different products, or different web search results, etc. This tutorial covers the popular Thurstone-Mosteller Case V model and the Bradley-Terry logistic variant. We describe three app...
Fitting loglinear Bradley-Terry models (LLBT) for paired comparisons using the R package prefmod
This paper aims at introducing the R package prefmod (Hatzinger, 2009) which allows the user to fit various models to paired comparison data. These models give estimated overall rankings of objects or items where each subject (respondent/judge) makes one or more comparisons between pairs of objects (items). The focus is on the loglinear Bradley-Terry (LLBT) model, the loglinear formulation of t...
Thurstone's Case V model: A structural equations modeling perspective
Modeling how we choose among alternatives, or more generally, modeling preferences, is one of the core topics of study in Psychology. Preferences can be studied experimentally using a variety of procedures, one of the oldest being the method of paired comparisons. This method remains quite popular in areas such as psychophysics and consumer psychology. For a good overview of the method of paire...
End-to-End Training of Deep Visuomotor Policies
Policy search methods can allow robots to learn control policies for a wide range of tasks, but practical applications of policy search often require hand-engineered components for perception, state estimation, and low-level control. In this paper, we aim to answer the following question: does training the perception and control systems jointly end-to-end provide better performance than training...
Journal title:
- Journal of Machine Learning Research
Volume 3
Publication date: 2002